Measuring the convergence of Monte Carlo free energy calculations 
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The nonequilibrium work fluctuation theorem provides the way for calculations of (equilibrium) 
free energy based on work measurements of nonequilibrium, finite-time processes and their reversed 
counterparts by applying Bennett's acceptance ratio method. A nice property of this method is that 
each free energy estimate readily yields an estimate of the asymptotic mean square error. Assuming 
convergence, it is easy to specify the uncertainty of the results. However, sample sizes have often 
to be balanced with respect to experimental or computational limitations and the question arises 
whether available samples of work values are sufficiently large in order to ensure convergence. Here, 
we propose a convergence measure for the two-sided free energy estimator and characterize some of 
its properties, explain how it works, and test its statistical behavior. In total, we derive a convergence 
criterion for Bennett's acceptance ratio method. 



PACS numbers: 02.50.Fz, 05.40.-a, 05.70.Ln 
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I. INTRODUCTION 

Many methods have been developed in order to es- 
timate free energy differences, ranging from thermody- 
namic integration fll Q , path sampling H , free en- 
ergy perturbation [J| , umbrella sarnpling [5|-l7| , adiabatic 
switching Q , dynamic methods [ol-H^ , optimal protocols 
[T3 | , asymptotic tails fisl, t o targeted and escorted 
free energy perturbation |16l42C|. Yet, the reliability and 
efficiency of the approaches have not been considered in 
full depth. Fundamental questions remain unanswered 
[2l| . e.g., what method is best for evaluating the free en- 
ergy? Is the free energy estimate reliable and what is the 
error in it? How can one assess the quality of the free 
energy result when the true answer is unknown? Gener- 
ically, free energy estimators are strongly biased for fi- 
nite sample sizes, such that the bias constitutes the main 
source of error of the estimates. Moreover, the bias can 
manifest itself in a seemingly convergence of the calcula- 
tion by reaching a stable value, although far apart from 
the desired true value. Therefore, it is of considerable in- 
terest to have reliable criteria for the convergence of free 
energy calculations. 

Here we focus on the convergence of Bennett's accep- 
tance ratio method. Thereby, we will only be concerned 
with the intrinsic statistical errors of the method and as- 
sume uncorrelated and unbiased samples from the work 
densities. For incorporation of instrument noise, see Ref. 

m. 

With emerging results from nonequilibrium stochas- 
tic thermodynamics, Bennett's acceptance ratio method 
[23l - |26j has revived actual interest. 

Recent research has shown that the isothermal free en- 



ergy difference A/ = /i — /o of two thermal equilib- 
rium states and 1, both at the same temperature T, 
can be determined by externally driven nonequilibrium 
processes connecting these two states. In particular, if 
we start the process with the initial thermal equilibrium 
state and perturb it towards 1 by varying the control 
parameter according to a predefined protocol, the work w 
applied to the system will be a fluctuating random vari- 
able distributed according to a probability density pq{w). 
This direction will be denoted with forward. Reversing 
the process by starting with the initial equilibrium state 
1 and perturbing the system towards by the time re- 
versed protocol, the work w done hy the system in the 
reverse process will be distributed according to a density 
Pi{w). Under some quite general conditions, the forward 
and reverse work densities po{w) and pi{w) are related 
to each other by Crooks fluctuation theorem [S^, [28j 



Po(w) 
pi{w) 



(1) 
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Throughout the paper, all energies are understood to be 
measured in units of the thermal energy kT, where k 
is Boltzmann's constant. The fluctuation theorem re- 
lates the equilibrium free energy difference A/ to the 
nonequilibrium work fluctuations which permits calcu- 
lation (estimation) of A/ using samples of work-values 
measured either in only one direction {one-sided estima- 
tion) or in both directions {two-sided estimation). The 
one-sided estimators rely on the Jarzynski relation [2^ 
e^'^^ = J e~'^po{'w)d'w which is a direct consequence of 
Eq. ([T|), and the free energy is estimated by calculating 
the sample mean of the exponential work. In general, 
however, it is of great advantage to employ optimal two- 
sided estimation with Bennett's acceptance ratio method 
[23} , although one has to measure work- values in both di- 
rections. 

The work fluctuations necessarily allow for events 
which "violate" the second law of thermodynamics such 
that w < A/ holds in forward direction and w > A/ in 
reverse direction, and the accuracy of any free energy es- 
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timate solely based on knowledge of Eq. ([T]) will strongly 
depend on the extend to which these events are observed. 
The fluctuation theorem indicates that such events will 
in general be exponentially rare; at least, it yields the in- 
equality (w)-^ < Af < {w)q [29]], which states the second 
law in terms of the average work (w)o and {w) ^ in forward 
and reverse direction, respectively. Reliable free energy 
calculations will become harder the larger the dissipated 
work {w)q — Af and Af — (w)-^ in the two directions is 
(201 ■ i.e. the farther from equilibrium the process is car- 
ried out, resulting in an increasing number A'^ of work 
values needed for a converging estimate of Af. This dif- 
ficulty can also be expressed in terms of the overlap area 
A ^ J m.m{po{w),pi{w)}dw < 1 of the work densities, 

which is just the sum of the probabilities J^^Padw and 
J^j: Pidw of observing second-law "violating" events in 
the two directions. Hence, N has to be larger than 1/A. 
However, an a priori determination of the number N of 
work values required will be impossible in situations of 
practical interest. Instead, it may be possible to deter- 
mine d posteriori whether a given calculation of Af has 
converged. The present paper develops a criterion for the 
convergence of two-sided estimation which relies on mon- 
itoring the value of a suitably bounded quantity a, the 
convergence measure. As a key feature, the convergence 
measure a checks if the relevant second-law "violating" 
events are observed sufficiently and in the right propor- 
tion for obtaining an accurate and precise estimate of 
A/. ^ 

Two-sided free energy estimation, i.e. Bennett's accep- 
tance ratio method, incorporates a pair of samples of 
both directions: given a sample {w^} of no forward work 
values, drawn independently frompo(w): together with a 
sample {wf} of ni reverse work values drawn frompi(w), 

the asymptotically optimal estimate Af of the free en- 
ergy difference Af is the unique solution of f23l-[26j 
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asymptotically vanishes for A'' — >■ 00, and the estimator 
is the one with least mean square error (viz. variance) in 
the limit of large sample sizes no and ni within a wide 
class of estimators. In fact, it is the optimal estimator if 
no further knowledge on the work densities besides the 
fluctuation theorem is given [20|, [l^l . It comprises one- 
sided Jarzynski estimators as limiting cases for a — > 
and a — >■ 1, respectively. Recently [S^l, the asymptotic 
mean square error has been shown to be a convex func- 
tion of a for fixed N, indicating that typically two-sided 
estimation is superior if compared to one-sided estima- 
tion. 

In the limit of large N , the mean square error 



= ((A/-A/)^ 



converges to its asymptotics 

X{N,a) ^ ^ 
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where the overlap (integral) Ua is given by 



PoPi 



apo + l3p-_ 



■dw. 



(5) 



(6) 



(7) 



Likewise, in the large N limit the probability density of 
the estimates Af (for fixed N and a) converges to a 
Gaussian density with mean Af and variance X{N,a) 
[23 |. Thus, within this regime a reliable confidence in- 
terval for a particular estimate Af is obtained with an 
estimate X{N,a) of the variance, 



XiN,a) := 



1 



1 



(8) 



where the overlap estimate Ua is given through 



Ua 
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(9) 



where a and (3 G (0,1) are the fraction of forward and 
reverse work values used, respectively. 
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f3 



iV' 



(3) 



with the total sample size N = uq + ni. 

Originally found by Bennett [1^ in the context of free 
energy perturbation [3], with "work" being simply an 
energy difference, the two-sided estimator ([2]) was gener- 
alized by Crooks (30| to actual work of nonequilibrium 
finite time processes. We note that the two-sided esti- 
mator has remarkably good properties [13, [H, [H, [sij . 
Although in general biased for small sample sizes N, the 
bias 



6= (A/- A/ 



(4) 



To get some feeling for when the large N limit "be- 
gins", we state a close connection between the asymp- 
totic mean square error and the overlap area A of the 
work densities as follows: 

see Appendix |^ Using a « 0.5 and assuming that the 
estimator has converged once X < 1, we find the "onset" 
of the large N limit for N > -j. However, this onset may 
actually be one or more orders of magnitude larger. 

If we do not know whether the large N limit is reached, 
we cannot state a reliable confidence interval of the free 
energy estimate: a problem which encounters frequently 
within free energy calculations is that the estimates "con- 
verge" towards a stable plateau. While the sample vari- 
ance can become small, it remains unclear whether the 
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FIG. 1: Displayed are free energy estimates A/ in dependence 
of the sample size A^, reaching a seemingly stable plateau if A'' 
is restricted to A'^ = 1000 (top panel). Another stable plateau 
is reached if the sample size is increased up to A'^ = 100 000 
(bottom panel). Has the estimate finally converged? The 
answer is given by the corresponding graph of the conver- 
gence measure a which is shown in the inset. The fluctuations 
around zero indicate convergence. The exact value of the free 
energy difference is visualized by the dashed horizontal line. 



happens almost simultaneously with the convergence of 
A/ to A/. The procedure is as follows: While drawing 
an increasing number of work values in both directions 
(with fixed fraction a of forward draws), successive esti- 
mates A/ and corresponding values of a, based on the 
present samples of work, are calculated. The values of a 
are displayed graphically in dependence of iV, preferably 
on a log-scale. Then the typical situation observed is 
that a is close to it's upper bound for small sample sizes 
iV < which indicates lack of "rare events" which are 
required in the averages of Eq. ([2]) (i.e. those events which 
"violate" the second law). Once N becomes compara- 
ble to ^ , single observations of rare events happen and 

change the value of A/ and a rapidly. In this regime of 
N, rare events are likely to be observed either dispropor- 
tionally often or seldom, resulting in strong fluctuations 
of a around zero. This indicates the transition region to 
the large N limit. Finally, for some N ^ ^, the large N 
limit is reached, and a typically fluctuates close around 
zero, cf. the inset of Fig. [T] 

The paper is organized as follows. In Sec. [TTl we first 
consider a simple model for the source of bias of two-sided 
estimation which is intended to obtain some insight into 
the convergence properties of two-sided estimation. The 
convergence measure a, which is introduced in Sec. IIIIl 
however, will not depend on this specific model. As the 
convergence measure is based on a sample of forward and 
reverse work values, it is itself a random variable, raising 
the question of reliability once again. Using numerically 
simulated data, the statistical properties of the conver- 
gence measure will be elaborated in Sec. lIVI The conver- 
gence criterion is stated in Sec. |Vl and Sec. IVII presents 
an application to the estimation of the chemical potential 
of a Lennard-Jones fluid. 



reached plateau represents the correct value of A/. Pos- 
sibly, the found plateau is subject to some large bias, i.e. 
far off the correct value. A typical situation is displayed 
in Fig. [T] which shows successive two-sided free energy 
estimates in dependence of the sample size N. The er- 
rorbars are obtained with an error-propagation formula 
for the variance of A/ which reflects the sample vari- 
ances, see Appendix [Cl after reading Sec. IIIIl If we take 
a look on the top panel of Fig. [TJ we might have the 
impression that the free energy estimate has converged 
at « 300 already, while the bottom panel reaches out 
to larger sample sizes where it becomes visible that the 
"convergence" in the top panel was just pretended. Fi- 
nally, we may ask if the estimates shown in the bottom 
panel have converged at A^ > 10000? As we know the 
true value of A/, which is depicted in the figure as a 
dashed line, we can conclude that convergence actually 
happened. 

The main result of the present paper is the statement of 
a convergence criterion for two-sided free energy estima- 
tion in terms of the behavior of the convergence measure 
a. As will be seen, a converges to zero. Moreover, this 



II. NEGLECTED TAIL MODEL FOR 
TWO-SIDED ESTIMATION 

To obtain some first qualitative insight into the relation 
between the convergence of Eq. ^ and the bias of the 
estimated free energy difference, we adopt the neglected 
tails model [s^ originally developed for one-sided free 
energy estimation. 

Two-sided estimation of A/ essentially means estimat- 
ing the overlap Ua from two sides, however in a depen- 
dent manner, as A/ is adjusted such that both estimates 
are equal in Eq. (0). 

Consider the (normalized) overlap density pa{w), de- 
fined as harmonic mean of pq and pi : 



I N 1 Vq{w)pi(w) 



Ua apo{w) + (3pi{w) 



(11) 



For Q — > and a — >■ 1, Pa converges to po and pi, re- 
spectively. The dominant contributions to Ua come from 
the overlap region of po and pi where pa has its main 
probability mass, see Fig. [2] (top). 
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FIG. 2: Schematic diagram of reverse pi, overlap p^, and for- 
ward po work densities (top). Schematic histograms of finite 
samples from po and pi , where in particular the latter is im- 
perfectly sampled, resulting in a biased estimate A/ of the 
free energy difference (bottom). 



depending on no and ni) and the expectation i^^fj is 
understood to be the unique root of Eq. p^ . thus being 
a function of the cut-off values wl, i = 0, 1. 

In order to elaborate the implications of this model, we 
rewrite Eq. (IT^ with the use of the fluctuation theorem 
(P) such that the integrands are equal. 



/ 

,(A/-A/) ^ -oo 



dw 



I 



Paj-u]) 



(13) 



dw 



and consider two special cases: 



1. Large ni limit: Assume the sample size ni is large 
enough to ensure that the overlap region is fully 
and accurately sampled (large ni limit). Thus, 
can be safely set equal to oo in Eq. ([T3|. and the 
r.h.s. becomes larger than unity. Accordingly, our 
model predicts a positive bias. 

2. Large hq limit: Turning the tables and using wj? = 
—oo in Eq. (|13l) . the model implies a negative bias. 



In essence, i^^f/ is shifted away from A/ towards the 
insufficiently sampled density. In general, when none of 
the densities is sampled sufficiently, the bias will be a 
trade off between the two cases. 

Qualitatively, from the neglected tails model, we find 
the main source of bias resulting from a different con- 
vergence behavior of forward and reverse estimates ([9]) 
of Ua ■ The task of the next section will be to develop a 
quantitative measure of convergence. 



In order to obtain an accurate estimate of A/ with the 
two-sided estimator ([2]), the sample {w^} drawn from po 
has to be representative for po up to the overlap region 
in the left tail oi po, and the sample [w].} drawn from pi 
has to be representative for pi up to the overlap region 
in the right tail oi pi. For small uq and ni, however, we 



will have certain effective cut-off values 



and 



for 



the samples from po and pi , respectively, beyond which 
we typically will not find any work values, see Fig. [2] 
(bottom). 

We introduce a model for the bias (|4|) of two-sided free 
energy estimation as follows. Assuming a "semi-large" 
N = no + rii, the effective behavior of the estimator for 
fixed no and ni is modeled by substituting the sample 
averages appearing in the estimator ([2]) with ensemble 
averages, however truncated at w'^ and wl, respectively: 



Po{w) 



(A/) 



dw 



pi{w) 



+ ^e~^H^f) 



dw. (12) 



Thereby, the cut-off values are thought fixed (only 



III. THE CONVERGENCE MEASURE 

In order to check convergence, we propose a measure 
which relies on a consistency check of estimates based 
on first and second moments of the Fermi functions that 
appear in the two-sided estimator (j9l). In a recent study 
[20], we already used this measure for the special case 
of a = ^. Here, we give a generalization to arbitrary 
a, study the convergence measure in greater detail, and 
justify its validity and usefulness. In the following we 
will assume that the densities po and pi have the same 
support. 

It was discussed in the preceding section that the large 
N limit is reached and hence the bias of two-sided esti- 
mation vanishes if the overlap Ua is (in average) correctly 
estimated from both sides, and 1. Defining the com- 
plementary Fermi functions tc{w) and bc{w) (for given 
a) with 



tc{w) 
bc{w) = 



1 



-W-\-C ' 



(14) 



such that atc{w) + phdw) = 1 and tc{w) = e^~'^hc{w) 
holds. The overlap ([7]) can be expressed in terms of first 
moments, 



Ur.^ t 



Ajiw)piiw)dw ^ bAf{w)po{w)dw, (15) 



and the overlap estimate Ua, Eq. ([9]), is simply obtained 
by replacing in Eq. (|15p the ensemble averages by sample 
averages, 



(16) 



According to Eq. ([2]), the value of A/ is defined such 
that the above relation holds. Note that A/ = 
A/(w]', . . . , w\^) is a single- valued function depending on 
all work values used in both directions. The overbar with 
index (i) denotes an average with a sample {w\} drawn 
from Pi, i — 0,1. For an arbitrary function g{w) it ex- 
plicitly reads 



1 



(17) 



fe=i 



Interestingly, Ua can be expressed in terms of second 
moments of the Fermi functions such that it reads 



Ua — a tj^jPidw + /3 



b'ifPodw 



(18) 



A useful test of self-consistency is to compare the first 
order estimate Ua, with the second order estimate t/^"', 
where the latter is defined by replacing the ensemble av- 
erages in Eq. (fT8| with sample averages: 



(19) 



Thereby, the estimates A/, Ua, and Uj^"\ are understood 
to be calculated with the same pair of samples {wj?} and 

The relative difference of this comparison results in the 
definition of the convergence measure, 



Ua - L/r' 

Ua ' 



(20) 



for all a € (0,1). Clearly, in the large TV limit, a will 
converge to zero, as then Af converges to Af and thus 
Ua as well as C/^"' converge to Ua ■ As argued below, it is 
the estimate J/^"' that converges last, hence a converges 
somewhat later than Af. 

Below the large N limit, a will deviate from zero. From 
the general inequality 



[/„ < < 2Ua 



(21) 



(see Appendix [B|) follow upper and lower bounds on a 
which read 



-1 < a < 1 - C/a < 1. 



(22) 
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FIG. 3: Schematic plot which shows that the forward work 
density, po{w), samples the Fermi function 6a/ (w) = l/{j3 + 
^gUj-AZ-j gQjjjg^ija^^; earlier than its square. 



The behavior of a with increasing sample size N = no+ni 
(while keeping the fraction a = 77- constant) can roughly 
be characterized as follows: a "starts" close to its upper 
bound for small N and decreases towards zero with in- 
creasing N. Finally, a begins to fluctuate around zero 
when the large N limit is reached, i.e. when the estimate 
Af converges. 

To see this qualitatively, we state that the second or- 
der estimate t/^"' converges later than the first order 
estimate Ua, as the former requires sampling the tails 
of Po and pi to a somewhat wider extend than the lat- 
ter, cf. Fig. [21 For small TV, both, Ua and U^">, wih 
typically underestimate Ua, as the "rare-events" which 
contribute substantially to the averages and (fT9)) 
are quite likely not to be observed sufficiently, if at all. 
For the same reason, generically C/^"' < Ua will hold, 
since bsj{w^)'^ < bsj{w^) holds for > Af and similar 

tsfiw^)"^ < t^{w^) for < Af. Therefore, a is typi- 
cally positive for small N. In particular, if N is so small 
that all work values of the forward sample are larger 
than Af and all work values of the reverse sample are 
smaller than Af, then U^"^ becomes much smaller than 
Ua, resulting in a ~ 1. 

Analytic insight into the behavior of a for small N re- 
sults from the fact that nx^ > x"^ for any set {xi, . . . Xn} 
of positive numbers Xk. Using this in Eq. ([T^ yields 



and 



i7<"> < 2Nal3Ul 



1 - 2a/3NUa <a<l-Ua. 



(23) 



(24) 



This shows that as long as NUa ^ 1 holds, a is close to 
its upper bound 1 — ~ 1. In particular, if a = ^ and 

N — 2, then a = 1 — Ua holds exactly. 

Averaging the inequality for some N sufficiently large 

to ensure (a) ~ and ^^q^ ~ ^a, we get a lower bound 

on N which reads N > jS^iT- Again, this bound can 

be related to the overlap area A: taking a = ^ and 
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FIG. 4: Exponential (left panel) and Gaussian (right panel) 
work densities. 



using Ui < 2 A (see Appendix |A|). we obtain iV > in 
concordance with the lower bound for the large N limit 
stated in Sec. HI 

Last we note that the convergence measure a can also 
be understood as a measure of the sensibility of relation 
([2]) with respect to the value of A/: in the low N regime, 
the relation is highly sensible to the value of A/, result- 
ing in large values of a, whereas in the limit of large N, 
relation ([2]) becomes insensible to small perturbations of 

A/, corresponding to a w 0. The details are summarized 
in Appendix [D] 



IV. STUDY OF STATISTICAL PROPERTIES OF 
THE CONVERGENCE MEASURE 




100 1000 10000 
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In order to demonstrate the validity of a as a mea- 
sure of convergence of two-sided free energy estimation, 
we apply it to two qualitatively different types of work 
densities, namely exponential and Gaussian, see Fig. SI 
Samples from these densities are easily available by stan- 
dard (pseudo) random generators. Statistical properties 
of a are obtained by means of independent repeated cal- 
culations of A/ and a. While the two types of densi- 
ties used are fairly simple, they are entirely different and 
general enough to reflect the statistical properties of the 
convergence measure. 

A. Exponential work densities 

The first example uses exponential work densities, i.e. 

Pi{w)^—e~~, w>0, (25) 

/ii > 0, i = 0,1. According to the fluctuation theorem 
©, the mean values fii of po a-nd pi are related to each 
other, fii — , and the free energy difference is known 
to be A/ = ln(l + /io). 

Choosing /xq = 1000 and a = i, i.e. rip = ni, we 
calculate free energy estimates A/ according to Eq. ^ 
together with the corresponding values of a according to 
Eq. (I^ni) for different total sample sizes N = no + ni. An 



FIG. 5: Statistics of two-sided free energy estimation (ex- 
ponential work densities): shown are averaged estimates of 
A/ in dependence of the total sample size N. The errorbars 
reflect the standard deviation. The dashed line shows the ex- 
act value of A/, and the inset the details for large N (top). 
Statistics of the convergence measure a corresponding to the 
estimates of the top panel: shown are the average values of a 
together with their standard deviation in dependence of the 
sample size N. Note the characteristic convergence of a to- 
wards zero in the large N limit (bottom). 



example of a single running estimate and the correspond- 
ing values of the convergence measure are depicted in 
Fig.m Ten-thousand repetitions for each value of N yield 
the results presented in Figs.lSHTOl To begin with, the top 
panel of Fig. [5] shows the averaged free energy estimates 
in dependence of N, where the errorbars show ± the esti- 
mated square root of the variance (^(A/ — (A/))^^ For 

small N, the bias (^^f ~ '-'^ ^^^^ energy estimates is 
large, but becomes negligible compared to the standard 
deviation for N > 5000. This is a prerequisite of the 
large N limit, therefore we will view N sa 5000 as the 
onset of the large N limit. 

The bottom panel of Fig. [S] shows the averaged values 
of the convergence measure a corresponding to the free 
energy estimates of the top panel. Again, the errorbars 
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100 1000 
N 

FIG. 6: Mean values of overlap estimates Ua and U^'^ of first 
and second order, respectively. The slightly slower conver- 
gence of Ua''' towards Ua results in the characteristic prop- 
erties of the convergence measure a. To enhance clarity, data 
points belonging to the same value of A*' are spread. 
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FIG. 8: The average convergence measure (a) plotted against 

the corresponding mean square error ^(A/ — A/)^^ of the 

two-sided free energy estimator. The inset shows an enlarge- 
ment for small values of (a). 




FIG. 7: (Color online) Double-logarithmic scatter plot of U^'^ 
versus Ua for many individual estimates in dependence of the 
sample size A'^. The dotted lines mark the exact value of Ua 
on the axes, and the dashed line is the bisectrix 17^"' = Ua- 
The approximatively linear relation between the logarithms of 
Ua'^ and Ua is continued up to the smallest observed values 
(< 10"^"", not shown here). 



are ± one standard deviation y {a?) — (a) , except that 
the upper limit is truncated for small A'^, as a < 1 holds. 
The trend of the averaged convergence measure (a) is 
in full agreement with the general considerations given 
in the previous section: for small N, (a) starts close to 
its upper bound, decreases monotonically with increas- 
ing sample size, and converges towards zero in the large 
N limit. At the same time, its standard deviation con- 
verges to zero, too. This indicates that single values of 




a 



FIG. 9: (Color online) A scatter plot of the deviation A/ — 
A/ versus the convergence measure a for many individual 
estimates in dependence of the sample size A'^. Note that the 
majority of estimates belonging to A'' = 32 and A'^ = 100 have 
large values of A/ — A/ well outside the displayed range with 
a being close to one. 



a corresponding to single estimates A/ will typically be 
found close to zero in the large N regime. 

Noting that a is defined as relative difference of the 
overlap estimators Ua and C/^"' of first and second or- 
der, respectively, we can understand the trend of the av- 
erage convergence measure by taking into consideration 

the average values (Uo^j and (u^'^^i which are shown 
in Fig. |6l For small sample sizes, Ua is typically under- 
estimated by both, Ua and f/^"', with C/^"' < Ua- 

The convergence measure takes advantage of the dif- 
ferent convergence times of the overlap estimators: C/^"' 
converges somewhat slower than Ua , ensuring that a ap- 
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FIG. 10: Estimated constrained probability densities 
p(Af\a < 0.9) (black) and p(A/|o > 0.9) (grayscale) for 
two different sample sizes A'^, plotted versus the deviation 
Af — Af. The inset shows averaged estimates of Af over 
the total sample size A'' subject to the constraints a > 0.9 
and a < 0.9, respectively. 

proaches zero right after A/ has converged. The large 
standard deviations shown as errorbars in Fig. [5] do not 
carry over to the standard deviation of a, because Ua and 
C/^"' sre strongly correlated, as is impressively visible in 
Fig. [T] The estimated correlation coefficient 

, ^ (26) 

^Var(;7r') Var(;7„) 

is about 0.97 (!) for the entire range of sample sizes N. 
In good approximation, Ua and Uj^'^ are related to each 
other according to a power law, JJ^"' ~ '^nU^'^ , where 
the exponent 7jv and the prefactor cjv depend on the 
sample size N (and a). We note that 7jv has a phase- 
transition-like behavior: for small iV, it stays approxi- 
mately constant near two; right before the onset of the 
large N limit, it shows a sudden switch to a value close 
to one where it finally remains. 

Figure [S] accents the decrease of the average (a) with 
decreasing mean square error ([5]) of two-sided estimation. 
The small N behavior is given by the upper right part of 
the graph, where (a) is close to its upper bound together 
with a large mean square error of Af. With increasing 
sample size, the mean square error starts to drop some- 
what sooner than (a), however, at the onset of the large 
N limit, they drop both and suggest a linear relation, 
as can be seen in the inset for small values of (a). The 
latter shows that (a) decreases to zero proportional to 
for large N (this is confirmed by a direct check, but not 
shown here). 



The next point is to clarify the correlation of single val- 
ues of the convergence measure with their corresponding 
free energy estimates. For this issue, figure |9] is most in- 
formative, showing the deviations A/— A/ in dependence 
of the corresponding values of a for many individual ob- 
servations. The figure makes clear that there is a strong 
relation, but no one-to-one correspondence between a 
and A/ — A/: For large A^, both a and A/— A/ approach 
zero with very weak correlations between them. However, 
the situation is different for small sample sizes N where 
the bias ~ ^•^) considerably large. There, the 
typically observed large deviations occur together with 
values of a close to the upper bound, whereas the atypi- 
cal events with small (negative) deviations come together 
with values of a well below the upper limit. Therefore, 
small values of a detect exceptional events if N is well be- 
low the large limit, and ordinary events if TV is large. 

To make this relation more visible, we split the esti- 
mates Af into the mutually exclusive events a > 0.9 
and a < 0.9. The statistics of the Af values within 
these cases are depicted in the inset of Fig. [TUl where 
normalized histograms, i.e. estimates of the constrained 
probability densities p{Af\a > 0.9) and p{Af\a < 0.9) 
are shown. The unconstrained probability density of 
Af can be reconstructed from a likelihood weighted 
sum of the constrained densities, p(Af) = p{Af\a > 
0.9)p„>o.9 -I- p{Af\a < 0.9)p„<o.9. The likelihood ratios 
read Pa>o.9/Pa<o.9 = 6.2 and 0.002 for N = 32 and 1000, 
respectively. Finally, the inset of Fig. [10] shows the av- 
erage values of constrained estimates Af over TV with 
errorbars of ± one standard deviation, in dependence of 
the condition on a. 

B. Gaussian work densities 

For the second example the work-densities are chosen 
to be Gaussian, 

Pi{w) = — =e , weR, (27) 

(TV27r 

i — 0, 1. The fluctuation theorem ([T]) demands both 
densities to have the same variance with mean val- 
ues fj,o = Af -\- and fXi = Af — itr^. Hence, po 
and pi are symmetric to each other with respect to A/, 
PQ{Af -\-w) = pi{Af — w). As a consequence of this sym- 
metry, the two-sided estimator with equal sample sizes 
riQ a-nd ni, i.e. a = 0.5, is unbiased for any N. However, 
this does not mean that the limit of large N is reached 
immediately. 

In analogy to the previous example, we proceed in pre- 
senting the statistical properties of a. Choosing a — Q 
and without loss of generality Af = 0, we carry out 10"* 
estimations of A/ over a range of sample sizes N . The 
forward fraction is chosen to be equal to a = 0.5, and for 
comparison, a = 0.999, and a = 0.99999, respectively. 
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FIG. 11: Gaussian work densities result in the displayed av- 
eraged estimates of A/. For comparison, three different frac- 
tions a of forward work values are used (top) . Average values 
of the convergence measure a corresponding to the estimates 
of the top panel (bottom). 



In the latter two cases, the two-sided estimator is biased 
for small N . We note that a = 0.5 is always the optimal 
choice for symmetric work densities which minimizes the 
asymptotic mean square error with respect to a [s^ . 

Comparing the top and the bottom panel of Fig. [TTl 
which show the statistics (mean value and standard de- 
viation as errorbars) of the observed estimates A/ and of 
the corresponding values of a, we find a coherent behavior 
for all three cases of a values. The trend of the average 
(a) shows in all cases the same features in agreement with 
the trend found for exponential work densities. 

As before, the characteristics of a are understood by 
the slower convergence of U^''' compared to that of [/„, 
as can be seen in Fig. [121 A scatter plot of [/^"' versus 
Ua looks qualitatively like Fig. [71 but is not shown here. 

Figure [T^l compares the average convergence measures 
as functions of the mean square error of A/ for the three 
values of a. For the range of small (a), all three curves 
agree and are linear. Again (a) decreases proportionally 
to jr for large N . Noticeable for small N is the shift 
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FIG. 12: Mean values of overlap estimates Ua and (74"' of 
first and second order {a = 0.5). 
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FIG. 13: The average convergence measure (a) plotted against 
the corresponding mean square error ^(A/ — Af)^\ of free 
energy estimates in dependence of A''. 



of (a) towards smaller values with increasing a. This 
results from the definition of a: the upper bound 1 — Ua 
of a tends to zero in the limits a 0, 1, as then J/q, — > 1. 

The relation of single free energy estimates A/ with 
the corresponding a values can be seen in the scatter 
plot of Fig. 1141 The mirror symmetry of the plot origi- 
nates from the symmetry of the work densities and the 
choice a = 0.5, i.e. of the unbiasedness of the two-sided 
estimator. Opposed to the foregoing example, the corre- 
lation between A/ — A/ and a vanishes for any value of 
N. Despite the lack of any correlation, the figure reveals 
a strong relation between the deviation A/ — A/ and the 
value of a: they converge equally to zero for large N. 

Last, figure [15] shows averages of constrained A/ esti- 
mates for the mutually exclusive conditions a > 0.9 and 
a < 0.9, now with a = 0.99999 in order to incorporate 
some bias. We observe the same characteristics as before, 
cf. the inset of Fig. [TOl The condition a < 0.9 filters the 
estimates A/ which are closer to the true value. 
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FIG. 14: (Color online) A scatter plot of the deviation A/ — 
A/ versus the convergence measure a for many individual 
estimates in dependence of the sample size N {a = 0.5). 
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FIG. 15: Averaged two-sided estimates of Af in dependence 
of the total sample size for the constraints a > 0.9, a 
unconstrained, and a < 0.9 (a = 0.99999). 



C. The general case 

The characteristics of the convergence measure are 
dominated by contributions of woA densities inside and 
near the region where the overlap density pa{w), Eq. (1111) , 
has most of its mass. We call this region the overlap re- 
gion. In the overlap region, the work densities may have 
one of the following characteristic relation of shape: 

1. Having their maxima at larger and smaller val- 
ues of work, respectively, the forward and reverse 
work densities both drop towards the overlap re- 
gion. Hence, any of both densities sample the over- 
lap region by rare events, only, which are responsi- 
ble for the behavior of the convergence measure. 

2. Both densities decrease with increasing w and the 
overlap region is well sampled by the forward work 
density compared with the reverse density. Espe- 



cially the "rare" events w < Af of forward direc- 
tion are much more available than the rare events 
w > Af of reverse direction. Hence, more or less 
typical events of one direction together with atypi- 
cal events of the other direction are responsible for 
the behavior of the convergence measure. Likewise 
if both densities increase with w. 

3. More generally, the work densities are some kind of 
interpolation between the above two cases. 

4. Finally, there remain some exceptional cases. For 
instance, if the forward and reverse work densities 
have different support or if they do not obey the 
fluctuation theorem at all. 

With respect to the exceptional case, the convergence 
measure fails to work, since it requires that the forward 
and reverse work densities have the same support and 
that the densities are related to each other via the fluc- 
tuation theorem ([T|). 

In all other cases, the convergence measure certainly 
will work and will show a similar behavior, regardless 
of the detailed nature of the densities. This can be 
explained as follows. In the preceding subsections, we 
have investigated exponential and Gaussian work densi- 
ties, two examples that differ in their very nature. While 
exponential work densities cover case number two, and 
Gaussians cover case number one, they show the same 
characteristics of a. This means that the characteristics 
of the convergence measure are insensitive to the individ- 
ual nature of the work densities as long as they have the 
same support and obey the fluctuation theorem. 

To this end, we want to point to some subtleties in the 
text of the actual paper. While the measure of conver- 
gence is robust with respect to the nature of work densi- 
ties, some heuristic or pedagogic explanations in the text 
are written with regard to the typical case number one, 
where the overlap region is sampled by rare events, only. 
This concerns mainly Sec. |TT] where we speak about ef- 
fective cut-off values in the context of the neglected tail 
model. These effective cut-off values would become void 
if we would try to explain the bias of exponential work 
densities qualitatively via the neglected tail model. Also 
the explanations in the text of the next section are mainly 
focused on the typical case number one. This concerns 
the passages where we speak about rare events. Never- 
theless, the main and essential statements are valid for 
all cases. 

The most important property of a is its almost simul- 
taneous convergence with the free energy estimator Af 
to an d priori known value. This fact is used to develop 
a convergence criterion in the next section. 



V. THE CONVERGENCE CRITERION 

Elaborated the statistical properties of the convergence 
measure, we are finally interested in the convergence of 
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a single free energy estimate. In contrast to averages 
of many independent running estimates, estimates based 
on individual realization are not smooth in see e.g. 

Fig.m 

For small N, typically U^"^ underestimates Ua more 
than Ua does, pushing a close to its upper bound. With 
increasing N, A/ starts to "converge" ; typically in a non- 
smooth manner. The convergence of A/ is triggered by 
the occurrence of rare events. Whenever such a rare event 
in the important tails of the work densities gets sampled, 
A/ jumps, and between such jumps. A/ stays rather 
on a stable plateau. The measure a is triggered by the 
same rare events, but the changes in a are smaller, unless 
convergence starts happening. Typically, the rare events 
that bring A/ near to its true value are the rare events 
which change the value of a drastically. In the typical 
case, these rare events let a even undershoot below zero, 
before A/ and a finally converge. 

The features of the convergence measure, 

1. it is bounded, a £ (—1, 1 — Ua], 

2. it starts for small N at its upper bound, 

3. it converges to a known value, a — > 0, 

4. and typically it converges almost simultaneously 
with A/, 

simplify the task of monitoring the convergence signif- 
icantly, since it is far easier to compare estimates of a 
with the known value zero than the task of monitoring 
convergence of A/ to an unknown target value. The 
characteristics of the convergence measure enable us to 
state: typically, if a is close to zero. A/ has converged. 

Deviations from the typical situation are possible. For 
instance. A/ may not show such clear jumps, neither may 
a. Occasionally, A/ and a, may also fluctuate exceed- 
ingly strong. Thus, a single value of a close to zero does 
not guarantee convergence of the free energy estimate as 
can be seen from some few individual events in the scatter 
plot of Fig. [14] that fail a correct estimate while a is close 
to zero. A single random realization may give rise to a 
fluctuation that brings a close to zero by chance, a fact 
that needs to be distinguished from a having converged 
to zero. The difference between random chance and con- 
vergence is revealed by increasing the sample size, since 
it is highly unlikely that a stays close to zero by random. 
It is the behavior of a with increasing iV, that needs to 
be taken into account in order to establish an equivalence 
between a — and A/ A/. 

This allows us to state the convergence criterion: 

if a fluctuates close around zero, convergence is as- 
sured, 

implying that if a fluctuates around zero. A/ fluctuates 
around its true value A/, the bias vanishes, and the mean 



1 w«ooooo< - 




total sample size N 



FIG. 16: Running estimates of the excess chemical potential 
fi"^^ in dependence of the sample size N {a — 0.9). The inset 
displays the corresponding values of the convergence measure 
a. 



square error reaches its asymptotics which can be esti- 
mated using Eq. (jSj. a fluctuating close around zero 
means that it does so over a suitable range of sample 
sizes, which extends over an order of magnitude or more. 



VI. APPLICATION 

As an example, we apply the convergence criterion to 
the calculation of the excess chemical potential /i^^ of 
a Lennard Jones fluid. Using Metropolis Monte Carlo 
simulation |34| of a fluid of Np particles, the forward 
work is deflned as energy increase when inserting at ran- 
dom a particle into a given configuration j35j . whereas 
the reverse work is defined as energy decrease when a 
random particle is deleted from a given Np + 1-particle 
configuration. The densities Pq{w) and pi{w) of forward 
and reverse work obey the fluctuation theorem ([1]) with 
A/ = fi'^^ . Thus, Bennett's acceptance ratio method 
can be applied to the calculation of the chemical poten- 
tial. 

Details of the simulation are reported in Ref. [20| . 
Here, the parameter values chosen read: Np = 120, 
reduced Temperature T* — 1.2, and reduced density 
p* = 0.5. 

Drawing work values up to a total sample size of 10^ 
with fraction a — 0.9 of forward draws (which will be 
a near-optimal choice [l^), the successive estimates of 
the chemical potential together with the corresponding 
values of the convergence measure are shown in Fig. 
The dashed horizontal line does not show the exact value 
of /x*^^ , which is unknown, but rather the value of the last 
estimate with N = 10^. Taking a closer look on the be- 
havior of the convergence measure with increasing N, we 
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FIG. 17: Statistics of estimates of the excess chemical poten- 
tial: shown are the average value and the standard deviation 
(as errorbars) in dependence of the sample size A''. The statis- 
tics of the corresponding values of the convergence measure 
is shown in the inset. 

observe a near unity for TV < 10^, indicating the low N 
regime and the lack of observing rare events. Then, a 
sudden drop near to zero happens at iV = 10^, which co- 
incides with a large jump of the estimate of /x*^^, followed 
by large fluctuations of a with strong negative values in 
the regime N — 10^ to 10^. This behavior indicates that 
the important but rare events which trigger the conver- 
gence of the fi'^^ estimate are now sampled, but with 
strongly fluctuating relative frequency, which in specific 
cases causes the negative values of a (because of too many 
rare events!). Finally, with N > 10^, a equilibrates and 
converges to zero. The latter is observed over two orders 
of magnitude, such that we can conclude that the latest 
estimate of fj,^^ with N = 10^ has surely converged and 
yields a reliable value of the chemical potential. The con- 
fidence interval of the estimate can safely be calculated 
as the square root of Eq. (|6]) (one standard deviation), 
and we obtain explicitly fi*^^ = —2.451 ± 0.005. 

Interested in the statistical behavior of a for the 
present application, we carried out 270 simulation runs 
up to iV = 10'* to obtain the average values and standard 
deviations of /x"^^ and a which are depicted in Fig. [ITl 



The dashed line marks the same value as that in Fig. [TB] 
Again, we observe the same qualitative behavior of a as 
in the foregoing examples of Sec. |Vl especially a posi- 
tive average value of (a) and a convergence to zero which 
occurs simultaneously with the convergence of Bennett's 
acceptance ratio method. 

VII. CONCLUSIONS 

Since its formulation a decade ago, the Jarzynski equa- 
tion and the Crooks fluctuation theorem gave rise to en- 
forced research of nonequilibrium techniques for free en- 
ergy calculations. Despite the variety of new methods, in 
general little is known about their statistical properties. 
In particular, it is often unclear whether the methods 
actually converge to the desired value of the free energy 
difference A/, and if so, it remains in question whether 
convergence happened within a given calculation. This is 
of great concern, as usually the calculations are strongly 
biased before convergence starts happening. In conse- 
quence, it is impossible to state the result of a single cal- 
culation of A/ with a reliable confidence interval unless 
a convergence measure is evaluated. 

In this paper, we presented and tested a quantitative 
measure of convergence for two-sided free energy esti- 
mation, i.e. Bennett's acceptance ratio method, which 
is intimately related to the fluctuation theorem. From 
this follows a criterion for convergence relying on mon- 
itoring the convergence measure a within a running es- 
timation of A/. The heart of the convergence criterion 
is the nearly simultaneous convergence of the free energy 
calculation and the convergence measure a. Whereas the 
former converges towards the unknown value A/, which 
makes it difficult or even impossible to decide when con- 
vergence actually takes place, the latter converges to an 
d priori known value. If convergence is detected with 
the convergence criterion, the calculation results in a re- 
liable estimate of the free energy difference together with 
a precise confidence interval. 



Appendix A 

The derivation of inequality (ITU)) relies on the close 
connection between the overlap Ua and the overlap area 
A, 



Ua^ — Q cgw > / - — ^dw = / mm{po,pi}dw ^ A, (Al) 

J apo + ppi J {a + fi)max{po,Pi} J 

= ^/ i/p +i/p < ^/ min{po,Pi}du' = '2A. (A2) 
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Together with the inequahty ^X{N,^) < X(N,a) of Bennett [2I], we obtain 



1-2A 1 - 1 , 1, , , 11/1 

< 1 = 7:X{N,-) < X(N,a) < ——:(-r - 1 

^ 2 



NA 



(A3) 



which directly yields inequality (flOl 



Appendix B 

Inequality (l?T|) can be obtained as follows. Noting that tj^w) < ^ and bc{'w) < ^, cf. Eqs. (ITil) we have 
2f/„ = tij''' + b^^'" > atfT*" + = t/<"' and further 



(Bl) 
(B2) 



which results in Eq. ([21 



Appendix C 



Appendix D 



The errorbars in Figs. [T] and [16] are obtained via the 
error-propagation formula for the variance of Bennett's 
acceptance ratio method. 

A possible estimate a^^ of the variance of the two-sided 
free energy estimator obtained from error-propagation 
reads 



1 



t 



-(1)2 



A/ 



1 bj 



-(0) 



-(0)^ 



t-A/ 



no 



(CI) 



Alternatively, a^p can be expressed through the overlap 

estimates Ua and C/^"' of first and second order, Eqs. (fTB)) 
and (fT9t. 



1 (7< 



ep 



t/2 



(C2) 



In the limit of large N, converges to the asymptotic 
mean square error X{N, a), Eq. ([6]). An upper bound on 
(Tgp follows from inequality ([23l) : 



CT^ < 2 



(C3) 



Finally let us mention that the convergence measure a, 
Eq. (I20I) , is closely related to the relative difference of the 
estimated asymptotic mean square error X, Eq. Q, and 



Consider the family 0(c) of A/ estimators, 
parametrized by the real number c [23l |: 



(c) = c + ln=^. 

Or. 



(Dl) 



For any fixed value of c, 0(c) defines a consistent estima- 
tor of A/, 0(c) ^^zi^ A/ Vc. For finite TV, however, the 
performance of the estimator strongly depends on c. The 
(optimal) two-sided estimate ([2|) is obtained by the ad- 
ditional condition 0(c) — c, such that tc*^' — be holds, 
and thus c — A/. A possible measure for the sensibility 
of the estimate 0(c) on c is it's derivative with respect to 



c. Using -§^t, 
we obtain 



-Ptcbc, 



dc'- 



atcbc, and ate + /3^c = 1; 



to be 



(D2) 



Taking the derivative at c = A/ directly results in the 
convergence measure a. 



a - (1 - C/„) 



(C4) 



A/ 



(D3) 
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